Assembly Language
© Copyright Brian Brown, 1988-2000. All rights reserved.
This module is the individual work of Brian Brown. It may not be copied or used in any form without his permission.
OBJECTIVE
The study of advanced microprocessor architectures will
aid students in their understanding of complex systems and enable efficient
software production.
iAPX386 Paging
The processor uses two levels of tables to
translate the linear address from the segmentation unit into the physical
address. The three components of the paging unit are the page directory, the page tables, and the page frames (the pages themselves).
CR3 contains the base address of the page directory, whilst CR2 holds the 32-bit linear address which caused the last page fault.
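The paging scheme works as follows.
The page directory is 4 kbytes long and has up to 1024 page directory entries. Each entry is a pointer to a page table. A page directory entry holds the physical base address of a page table in bits 31-12, with the low bits holding status and protection flags: Present (bit 0), Read/Write (bit 1), User/Supervisor (bit 2) and Accessed (bit 5).
Each page table is 4 kbytes long and has up to 1024 page table entries. Page table entries hold the start address of the page frame.
The page frame is determined as follows: bits 31-22 of the linear address select a page directory entry, bits 21-12 select an entry in the page table that it points to, and bits 11-0 give the offset within the 4 kbyte page frame.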
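The two-level lookup can be modelled in a few lines of Python. This is only a sketch: the function names, the dictionary standing in for physical memory, and the table addresses (1000h, 2000h, 5000h) are assumptions made for illustration, and the protection bits other than Present are ignored.

```python
def split_linear(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def translate(addr, cr3, memory):
    """Walk the page directory and page table held in `memory`.

    `memory` is a dict mapping a physical address to a 32-bit table entry;
    it is a stand-in for real memory, not a 386 structure.
    """
    dir_index, table_index, offset = split_linear(addr)
    pde = memory[cr3 + dir_index * 4]            # each entry is 4 bytes
    if not pde & 1:                              # Present bit clear
        raise LookupError("page fault: page table not present")
    pte = memory[(pde & 0xFFFFF000) + table_index * 4]
    if not pte & 1:
        raise LookupError("page fault: page not present")
    return (pte & 0xFFFFF000) | offset           # frame base + offset


# Hypothetical tables: directory at 1000h, one page table at 2000h,
# one page frame at 5000h.
memory = {
    0x1000 + 1 * 4: 0x2000 | 1,   # directory entry 1 -> page table at 2000h
    0x2000 + 3 * 4: 0x5000 | 1,   # table entry 3     -> page frame at 5000h
}
physical = translate(0x00403025, cr3=0x1000, memory=memory)
print(hex(physical))
```

Linear address 00403025h splits into directory index 1, table index 3 and offset 025h, so the model returns 5025h.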
Code/Data separation
In the simple iAPX86 system, data is accessed
using the DS or ES segment registers, whilst code is accessed using the CS
segment register. This system is not foolproof, as the registers can be modified
by any program.
In the iAPX386, descriptors specify either code or data segments. As different selectors are used to access the code and data areas of a task, this ensures separation (though the descriptors can overlay the two areas). Loading a segment register in protected mode is subject to privilege checks, so a user task cannot load a selector for a segment it has no right to access.
The use of permission and access rights prevents unwanted data or code modifications.
Separate address spaces, Inter-process protection
Each task in an
iAPX386 system is known by its associated descriptor (in the LDT). It has access
and privilege levels associated with it. A task cannot access a descriptor's
segment with a higher privilege than it currently has. This prevents one task
trying to access another higher level task like the operating system.
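The privilege comparison for a data segment can be sketched as below. The function name is hypothetical, and the requested privilege level (RPL) carried in the selector is an addition not discussed in these notes; on the 386 the check uses the numerically larger (less privileged) of CPL and RPL.

```python
def data_access_allowed(cpl, rpl, dpl):
    """Return True if a data segment with privilege `dpl` may be loaded.

    Levels run from 0 (most privileged) to 3 (least privileged), so the
    access is allowed only when max(CPL, RPL) is numerically <= DPL.
    """
    return max(cpl, rpl) <= dpl


# A level 3 user task may use its own level 3 data...
print(data_access_allowed(cpl=3, rpl=3, dpl=3))
# ...but not a level 0 operating system segment.
print(data_access_allowed(cpl=3, rpl=3, dpl=0))
```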
If the task descriptor is not present, no other task can access its associated code or data segments. It is possible that a task's descriptor could be temporarily removed from the descriptor table.
The iAPX386 provides a special mechanism called a call gate which allows a low privilege task to call a higher privilege task (i.e. the OS).
A task is also allocated its own stack, with a back link to the previous stack frame. This prevents a task from modifying the previous stack. Upon the next task switch, the previous stack frame is restored using the back link. These back link registers are not visible to user tasks.
User/Supervisor protection
The use of two processor modes in the
iAPX386 allows the processor to determine the validity of executing privileged
instructions.
Descriptors have both user and supervisor bits, and privilege levels. As the operating system would run with privilege level 0 in its descriptors, this prevents any lower level task from accessing its code or data segments.
As certain instructions (like modifying descriptors and page entries) are restricted (can only be done in supervisor mode), this prevents users modifying system tables.
Privileged instructions
The 386 processor executes at one of four
privilege levels, given by the current privilege level (CPL) held in the low two bits of the CS register. At level 0,
all instructions are allowed. When executing at any other level (1-3), the
following instructions cause a general protection exception,
LGDT LIDT LLDT LTR LMSW CLTS HLT
together with moves to or from the control, debug and test registers. The related store and verify instructions (SGDT SIDT SLDT STR SMSW ARPL LAR LSL VERR VERW) may be executed at any privilege level.
In addition, each task's TSS can hold a bit map which represents which IO ports the task can access. IO instructions to ports which are not allowed cause a general protection exception.
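The IO port check can be sketched as follows. The function name and the sample bit map are assumptions; the IOPL field of EFLAGS is not covered in the text above, but on the 386 the bit map is only consulted when CPL is numerically greater than IOPL. A bit set to 1 denies the port, and ports beyond the end of the bit map are denied.

```python
def io_allowed(port, size, cpl, iopl, bitmap):
    """Model the IO permission check for an access of `size` bytes at `port`.

    `bitmap` is a bytes object standing in for the task's IO permission
    bit map (bit n covers port n, 1 = denied).
    """
    if cpl <= iopl:                      # privileged enough: no bit map check
        return True
    for p in range(port, port + size):   # every byte of the access is checked
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap):    # past the end of the map: denied
            return False
        if (bitmap[byte_index] >> bit) & 1:
            return False
    return True


# Hypothetical map: port 0 allowed, ports 1-15 denied.
bitmap = bytes([0b11111110, 0xFF])
print(io_allowed(0, 1, cpl=3, iopl=0, bitmap=bitmap))   # byte access to port 0
print(io_allowed(0, 2, cpl=3, iopl=0, bitmap=bitmap))   # word access spans port 1
```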
Protection rings and rights
The 386 has four levels of protection,
defined by the privilege level fields in segment descriptors and selectors (and the IOPL field in EFLAGS). Protection rings are a
concept which has appeared in mainframe operating systems, whereby the software
is divided up into a series of privilege levels, with ring 0 the most privileged and ring 3 the least.
Enforced interfaces are used to travel from one protection level to another (one ring to another). For instance, a user's application requesting an OS service is required to go from ring 3 inward towards ring 0. This is achieved in the 386 by special descriptors called call gates.
iAPX386 Addressing Modes
Register
The operand is located in one of the registers,
        mov ax, bx             ; transfer bx to ax
        mov eax, ecx           ; 32 bit register transfer on 386
Operand sizes must match
mov eax, bx ; illegal, register sizes different
Immediate
The operand is part of the instruction, and is
expressed as a constant
count   equ 20
        mov eax, 4fefh         ; 32 bit hexadecimal constant
        mov cx, count          ; mov count into cx
        sub bl, 'A'            ; subtract 'A' from bl, leave in bl
        or  bl, 01000000b      ; set bit 6 in bl
The remaining addressing modes use a segment:offset pair to specify the operand's effective address.
Direct
The EA of the operand is specified as part of the
instruction. This is a constant, but interpreted as an address, not a
value as in immediate addressing.
eg, MOV AX, DS:[0004h]
        assume DS : data2
        mov ax, seg data2
        mov ds, ax
        mov ax, ds:[0004h]     ; read memory location ds:0004 into ax
The default segment register is DS, but may be over-ridden by using a segment prefix,
        mov ax, ds             ; register addressing to copy ds into
        mov es, ax             ; segment register es (a direct mov es, ds is illegal)
        mov ax, es:[0002h]     ; ax = 20
It is more correct to use a modifier which specifies the size of the data to be transferred,
        mov al, byte ptr es:[000Eh]    ; al = 'e'
        mov cx, word ptr ds:[0000h]    ; cx = 10
(Ref test1.asm)
When using the BP register, the default segment register is SS.
Register Indirect
The EA of the operand is specified by the
segment register and an offset contained in a register.
eg. mov ax, word ptr ds:[bx]
        assume ds : data2
        mov bx, offset msg     ; bx = 000Ch, start of 'msg' in data2
        mov al, byte ptr [bx]  ; al = 'h' ( ds:000Ch)
        mov bx, offset TABLE   ; bx = 0000h, start of TABLE in data2
        mov ax, word ptr [bx]  ; ax = 10

;Code example to sum contents of integer array TABLE
        assume DS : data2
        mov ax, seg data2
        mov ds, ax             ; immediate, load ds to segment data2
        mov ax, 0              ; immediate, clear ax
        mov cx, TBL            ; number of elements in TABLE
        mov bx, offset TABLE   ; point bx to TABLE
        clc                    ; clear carry flag
lp1:    adc ax, word ptr ds:[bx]
        inc bx                 ; point bx to next element in TABLE
        inc bx
        dec cx                 ; decrement element count
        jne lp1                ; add all elements
                               ; ax now has element total
(Ref test2.asm)
Based
EA = Segment Register + Base register + Constant
eg. mov ax, word ptr ds:[bx + 2]
        assume ds : data2
        mov ax, seg data2
        mov ds, ax
        mov bx, offset TABLE   ; bx = 0000h
        mov ax, word ptr [bx+2] ; ax = [DS:BX+2] = 20
(Ref test3.asm)
Indexed
EA = Segment Register + Constant + Index register
eg. mov ax, word ptr ds:TABLE[si]
        assume ds : data2
        mov ax, seg data2
        mov ds, ax
        mov si, 0              ; clear si
        mov ax, 0              ; clear sum count
        mov cx, TBL            ; number of elements in TABLE
        clc                    ; clear carry flag
lp1:    adc ax, word ptr ds:TABLE[si]
        inc si                 ; point si to next element in TABLE
        inc si
        dec cx                 ; decrement element count
        jne lp1                ; add all elements
                               ; ax now has element total
(Ref test4.asm)
In this instance, the location of TABLE is known at compile time.
Based Indexed
EA = Segment register + Base register + Index
Register
eg. mov ax, word ptr ds:[bx+si]
        assume ds : data2
        mov ax, seg data2
        mov ds, ax
        mov si, 0              ; clear si
        mov ax, 0              ; clear sum count
        mov bx, offset TABLE   ; point bx to TABLE of integers
        mov cx, TBL            ; number of elements in TABLE
        clc                    ; clear carry flag
lp1:    adc ax, word ptr ds:[bx+si]
        inc si                 ; point si to next element in TABLE
        inc si
        dec cx                 ; decrement element count
        jne lp1                ; add all elements
                               ; ax now has element total
(Ref test5.asm)
In this instance, the location of TABLE is not known at compile time, thus is loaded into a base register (bx) when the program runs.
Scaled Indexed
(386 only)
EA = Segment Register + Constant Offset + Index Reg * Scale
eg. mov ax, word ptr ds:TABLE[esi*2]
        assume ds : data2
        mov ax, seg data2
        mov ds, ax
        mov esi, 0             ; clear esi
        mov ax, 0              ; clear sum count
        mov cx, TBL            ; number of elements in TABLE
        clc                    ; clear carry flag
lp1:    adc ax, word ptr ds:TABLE[esi*2]
        inc esi                ; point esi to next element in TABLE
        dec cx                 ; decrement element count
        jne lp1                ; add all elements
                               ; ax now has element total
(Ref test6.asm)
In this instance, the location of TABLE is known at compile time, and the element size is also known (2=integers, 4=floats).
Based Scaled Indexed
(386 only)
EA = Segment register + Base register + Index Reg * Scale
eg. mov ax, word ptr ds:[ebx+esi*2]
        assume ds : data2
        mov ax, seg data2
        mov ds, ax
        mov esi, 0             ; clear esi
        mov ax, 0              ; clear sum count
        mov ebx, offset TABLE
        mov cx, TBL            ; number of elements in TABLE
        clc                    ; clear carry flag
lp1:    adc ax, word ptr ds:[ebx+esi*2]
        inc esi                ; point esi to next element in TABLE
        dec cx                 ; decrement element count
        jne lp1                ; add all elements
                               ; ax now has element total
(Ref test7.asm)
In this instance, the location of TABLE is not known at compile time, and is therefore loaded into a base register, while the element size is known (2=integers, 4=floats).
Based Indexed Plus Displacement
EA = Segment register +
Constant + Base register + Index reg
eg. mov ax, word ptr ds:JOHN[esi + ebx]
        assume ds : data3
        mov ax, seg data3
        mov ds, ax
        mov ebx, offset Sue[Addr]  ; point base reg to Sues address
        mov esi, 0             ; offset 0 in address field
        mov cx, 20             ; 20 characters to print
lp1:    mov al, byte ptr ds:Clients[esi+ebx]   ; relative to Clients
        call display_char      ; display character
        inc esi                ; point to next character
        dec cx                 ; decrement character count
        jne lp1                ; loop while cx not zero
(Ref test8.asm)
Based Scaled Indexed Plus Displacement
EA = Segment register
+ Constant + Base register + Index reg * Scale
eg. mov ax, word ptr ds:JOHN[esi * 2 + ebx]
Client  Struc
CName   db 20 dup (' ')
Addr    db 20 dup (' ')
Age     dw ?
Accnt   dw ?                   ; 4 accounts, array of integer
        dw ?
        dw ?
        dw ?
Client  ends

data3   segment para use16 public 'data'
Clients Label Word
John    Client <'John Bloggs ','34 Long Grove ',37, 0, 5, 10, 15>
Sue     Client <'Sue Appleyard ','65 Willows Lane ',34, 2, 4, 14, 18>
Msg     db 'Sues total is ','$'
Buff    db 6 dup (' ')
eostr   db '$'
data3   ends

CSEG    segment word use16 'code'
        assume cs:CSEG,ds:data3
        public start
start:  mov ax, seg data3
        mov ds, ax
        mov ebx, offset Sue[Accnt] ; base register = accounts
        mov esi, 0             ; index register points to first accnt
        mov cx, 4              ; there are four accounts
        mov ax, 0              ; total of accounts = 0
        clc                    ; clear carry flag
lp1:    adc ax, word ptr ds:Clients[esi*2+ebx]
        inc esi                ; next account
        dec cx                 ; decrement account numbers
        jne lp1                ; continue for all accounts
        call disp_int          ; display total
(Ref test9.asm)
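The effective address arithmetic behind these modes can be checked with a small Python model. This is an illustrative sketch, not the assembler's method: the helper names are made up, the data segment is rebuilt with the struct module, and the offsets (Sue at 50, Accnt at 42 within the structure) follow from the field sizes declared above.

```python
import struct


def effective_address(disp=0, base=0, index=0, scale=1):
    """Offset part of the address: constant + base + index * scale."""
    return (disp + base + index * scale) & 0xFFFFFFFF


def pack_client(name, addr, age, accounts):
    """Lay out one Client record: 20 + 20 + 2 + 4*2 = 50 bytes, little endian."""
    return struct.pack("<20s20sH4H",
                       name.ljust(20).encode("ascii"),
                       addr.ljust(20).encode("ascii"),
                       age, *accounts)


# Rebuild the start of the data3 segment: John at offset 0, Sue at offset 50.
data3 = (pack_client("John Bloggs", "34 Long Grove", 37, [0, 5, 10, 15]) +
         pack_client("Sue Appleyard", "65 Willows Lane", 34, [2, 4, 14, 18]))

CLIENTS = 0           # offset of label Clients
SUE = 50              # offset of the second record
ACCNT = 20 + 20 + 2   # offset of Accnt within a record

ebx = SUE + ACCNT     # models: mov ebx, offset Sue[Accnt]
total = 0
for esi in range(4):  # four accounts
    ea = effective_address(disp=CLIENTS, base=ebx, index=esi, scale=2)
    total += struct.unpack_from("<H", data3, ea)[0]

print(total)
```

The loop visits offsets 92, 94, 96 and 98 and sums Sue's four accounts (2 + 4 + 14 + 18 = 38).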
Advantages of Protected Mode
Setting Up For Protected Mode Operation
The 80386 powers up in REAL
MODE (just like an 8086). To set up the processor for protected mode operation,
perform the following,
1. Set up descriptor table entries for GDT, IDT and if required, LDT
2. Load GDTR and IDTR with the table base address
3. Enable protected mode, setting the PE bit in CR0
4. Do an inter-segment (far) jump to load CS and flush the instruction queue
5. Load the data segment registers with selector values
All segment registers continue to hold the values they did in REAL MODE, pointing to the same 64kb address spaces. They remain as "real mode" selectors till their contents are altered.
When a segment register is loaded, the descriptor that the selector refers to is read from the GDT (or LDT) and cached internally.
Returning From Protected Mode Operation
1. Set up descriptor table
entries that look like 'Real mode' segments
        Attribute             Value
        Limit                 FFFF
        Granularity           0
        Expansion direction   0
        Write                 1
        Present               1
2. Load these descriptors into DS, ES, FS, GS and SS
3. Set up a descriptor table entry for a real mode code segment (W=0)
4. Set PE =0 in CR0 to disable protected mode
5. Execute a FAR JUMP to the next instruction to flush the instruction queue
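The 'Real mode' style data descriptor from step 1 can be packed into its 8-byte table form as sketched below. The function name and the base address 12340h are assumptions chosen for illustration; the access byte 92h encodes Present = 1, DPL = 0, a writable expand-up data segment.

```python
def make_descriptor(base, limit, access, granularity=0, default_32=0):
    """Pack a segment descriptor into the 8 bytes stored in the GDT or LDT."""
    desc = limit & 0xFFFF                     # limit bits 15-0
    desc |= (base & 0xFFFFFF) << 16           # base bits 23-0
    desc |= (access & 0xFF) << 40             # P, DPL, S and type
    desc |= ((limit >> 16) & 0xF) << 48       # limit bits 19-16
    desc |= (default_32 & 1) << 54            # D/B bit
    desc |= (granularity & 1) << 55           # G bit
    desc |= ((base >> 24) & 0xFF) << 56       # base bits 31-24
    return desc.to_bytes(8, "little")


DATA_WRITABLE = 0x92   # Present, DPL 0, data, expand up, writable

# Limit FFFF, Granularity 0, as in the table above; the base is hypothetical.
real_like = make_descriptor(base=0x12340, limit=0xFFFF, access=DATA_WRITABLE)
print(real_like.hex())
```

For comparison, a flat 4 gigabyte protected mode data segment would use limit FFFFFh with both the G and D bits set.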
HARDWARE RESET
Upon reset the 386 tests the internal registers and
logic. Register EAX should hold zero if this test was successful. The registers
are initialised as follows,
        Register   Value
        EFLAGS     02h
        CR0        0
        CS         F000h
        IP         FFF0h
        DS-GS      0
        DX         Component and ID revision number
The 386 will execute the first instruction at FFFFFFF0h in real mode. Upon the first inter-segment jump or call, A20-A31 will be driven low, so that the 386 can only access the 1st megabyte of memory (like an 8088).
The self test is performed by taking RESET low when BUSY is also low. Testing takes 2**19 clocks or 26 milliseconds @ 20 MHz. The 386 is reset by holding RESET high for at least 15 clocks, and a further 80 clocks if the self test is to be performed.
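The reset figures can be checked with a little arithmetic. The sketch below uses hypothetical helper names and simply models the address lines A20-A31 being held high until they are first driven low.

```python
def real_mode_address(segment, offset):
    """Ordinary real mode address: segment * 16 + offset, 20 bits wide."""
    return ((segment << 4) + offset) & 0xFFFFF


def reset_fetch_address(cs=0xF000, ip=0xFFF0):
    """Address of the first fetch after reset, while A20-A31 are still high."""
    return 0xFFF00000 | real_mode_address(cs, ip)


def self_test_ms(clock_hz, clocks=2 ** 19):
    """Self test duration in milliseconds for a given clock frequency."""
    return clocks / clock_hz * 1000.0


print(hex(reset_fetch_address()))   # CS = F000h, IP = FFF0h
print(self_test_ms(20e6))           # 2**19 clocks at 20 MHz
```

With CS = F000h and IP = FFF0h the real mode address is FFFF0h, which becomes FFFFFFF0h with the upper address lines high, and 2**19 clocks at 20 MHz comes to about 26.2 milliseconds.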
CO-PROCESSOR INTERFACE
The processor provides a co-processor
interface by use of the signals,
PEREQ BUSY ERROR
which are inputs to the processor. When the 386 starts a co-processor instruction (which all begin with F), it tests the BUSY and ERROR inputs. The co-processor then executes the instruction, driving BUSY low till it has finished, then it drives PEREQ to signal the 386 that the co-processor has finished.
The 80387 is located at ports 800000f8-ffh, and comprises two ports, a command/status, and a data port. The 386 can continue processing, but if there is a need for the program to wait till the co-processor has finished the instruction, the WAIT instruction can be used (it samples the BUSY line).
In testing for a co-processor, the only instructions to use are
        FINIT, FNINIT
        FSTSW mem
        FSTSW ax
The general procedure is to initialise the co-processor using FINIT, then use FSTSW ax which dumps the 387 status into register ax. The value in ax then indicates a co-processor or not.
Floating Point Instruction Approximate Execution Time (µs)
        Instruction                        8087 @ 8 MHz   8086 S/W
        Add/Subtract                       10.6           1000
        Multiply (Single precision)        11.9           1000
        Multiply (Extended precision)      16.9           1312.5
        Divide                             24.4           2000
        Compare                            5.6            812.5
        Square root                        22.5           12250
        Tangent                            56.3           8125
        Exponentiation                     62.5           10687.5
Testing for the presence of the 8087/287 Co-processor
; Source code
cr      equ 0dh
lf      equ 0ah

        assume cs : code, ds : data

code    segment public
start:  mov ax, seg data
        mov ds, ax
        fninit                         ; initialise co-processor, no WAIT
        xor ah, ah
        mov byte ptr control+1, ah     ; clear high byte of control word
        fnstcw control                 ; a co-processor overwrites it
        mov ah, byte ptr control+1
        cmp ah, 03h                    ; high byte is 03h after FNINIT
        jne no_coproc
coproc: mov dx, offset msg_yes
        jmp print_msg
no_coproc:
        mov dx, offset msg_no
print_msg:
        mov ah, 09h                    ; DOS print string
        int 21h
done:   mov ah, 4ch                    ; DOS terminate
        int 21h
code    ends

data    segment public
control dw ?
msg_yes db cr, lf, 'System has an 8087/287', cr, lf, '$'
msg_no  db cr, lf, 'System does NOT have 8087/80287', cr, lf,'$'
data    ends
        end start
80386 and the 80387 Co-Processor
The 80386 talks to the 80387 via 2
I/O ports. When it encounters a floating point instruction, it copies the
instruction to the command port at 800000F8h (first five bits of the instruction
are 11011).
The 80386 then fetches the next instruction and continues.
If the 80387 requires more information, eg, an operand from memory, it asserts PEREQ (Processor extension request). At the next break between instructions, the 80386 reads the 80387 status port to determine what the 80387 requires.
If this involves a transfer of operands, the 80386 reads or writes the data between memory and the 80387's data port at 800000FCh. This continues till the 80387 releases PEREQ.
Whilst the 80387 is executing its instructions, it drives BUSY to inform the 80386 that it cannot accept new instructions. The 80386 is thus prevented from loading new instructions till BUSY is released by the 80387.
Tricks with the 80387
Test and Compare
        FSTSW AX               ;store status in AX
        SAHF                   ;store AH in flags
                               ;now use conditional jump or set instructions!!!!!

        Bit   386   387
        6     ZF    C3
        2     PF    C2
        0     CF    C0
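The mapping in the table can be modelled as below. This is a sketch with made-up function names: condition codes C3, C2 and C0 sit in bits 14, 10 and 8 of the status word, so after FSTSW AX they land in bits 6, 2 and 0 of AH, which SAHF copies into ZF, PF and CF. The result names assume a preceding compare instruction such as FCOM.

```python
C0, C2, C3 = 1 << 8, 1 << 10, 1 << 14   # bits in the 387 status word
CF, PF, ZF = 1 << 0, 1 << 2, 1 << 6     # bits in the low byte of the flags


def sahf_flags(status_word):
    """Model FSTSW AX followed by SAHF, keeping only CF, PF and ZF."""
    ah = (status_word >> 8) & 0xFF       # AH holds the high byte of AX
    return ah & (CF | PF | ZF)           # SAHF also loads SF and AF, ignored here


def compare_result(status_word):
    """Interpret the flags as the conditional jumps would after a compare."""
    flags = sahf_flags(status_word)
    if flags & PF:
        return "unordered"               # C2 set: operands not comparable
    if flags & ZF:
        return "equal"                   # C3 set
    if flags & CF:
        return "less"                    # C0 set: ST < operand
    return "greater"


print(compare_result(C3))                # status word with only C3 set
```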
A WAIT must be inserted between an 80387 instruction that writes an operand to memory and an 80386 instruction that tries to read it.
;Source code
; coproc.asm a software example to multiply two numbers together
        PAGE 55, 132
        TITLE COPROC
        .8087

data    segment para public 'data'
hm1     dd 10000
hm2     dd 10
hm3     dd ?
hm4     dd ?
data    ends

cseg    segment para public 'CODE'
        assume cs:cseg, ds:data
        public start
start:  mov ax, seg data
        mov ds, ax
        finit
        fild hm2
        fimul hm1
        fist hm3
        fst hm4
        mov ah, 4ch
        int 21h
cseg    ends
        end start
BUS LOCKING AND READ-MODIFY WRITE CYCLES
Some instructions
performed by processors involve reading a memory location into the processor,
altering it in some way, then writing the new value to the same memory location.
Simple examples of such instructions are,
        inc word ptr [bx]
        or  byte ptr ds:[0100h], 20h
Two separate memory transfers are required for the one instruction, one memory transfer to read the original value, the other memory cycle to write the new value back.
This is called a read-modify-write cycle.
The programmer thinks of this as one instruction, yet the processor treats it as two separate and distinct memory cycles. This causes problems in multi-processor systems and with other devices that share the bus.
For example, because the processor treats the instruction as two separate memory cycles, it can allow another processor or bus master to take over the bus after the first memory cycle and before the second (remember, it's the one instruction).
Consider the following scenario for the processor executing the instruction
        xor status, 20h
After the initial read, a second processor gains the bus and tests the value of the software variable 'status'. At this point, the second processor works on the original value, as the first processor has not yet completed the instruction.
To overcome this problem, the 386 processor provides a LOCK instruction prefix for certain instructions. This prevents other devices access to the memory subsystem till the instruction is completed.
Read-modify-write instructions are often used to implement semaphores. A binary semaphore has two values, which indicate whether access is allowed or prohibited.
Using the BTS instruction, the 386 reads the semaphore bit and stores its state in the Carry flag. The semaphore is then set and written back to memory. A subsequent test of the Carry flag state indicates whether or not the semaphore was originally set.
If the Carry flag is not set, it's okay to access the resource, and the semaphore has already been set to indicate this.
If the Carry flag is set, you need to wait, and since it was originally set anyway, setting it again with the BTS instruction did not change its original value.
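The semaphore protocol can be modelled in Python as below. This is only an analogy under stated assumptions: the class and method names are made up, and a threading.Lock stands in for the bus lock that the LOCK prefix provides, so that the read and the write of the bit cannot be separated.

```python
import threading


class LockedMemory:
    """A memory word whose read-modify-write cycles cannot be interleaved."""

    def __init__(self):
        self._bus_lock = threading.Lock()   # stands in for the LOCK prefix
        self.value = 0

    def lock_bts(self, bit):
        """Model LOCK BTS: return the old bit (the Carry flag), then set it."""
        with self._bus_lock:
            carry = (self.value >> bit) & 1
            self.value |= 1 << bit
        return carry

    def release(self, bit):
        """Clear the semaphore bit when the resource is no longer needed."""
        with self._bus_lock:
            self.value &= ~(1 << bit)


semaphore = LockedMemory()
first = semaphore.lock_bts(0)    # Carry clear: resource was free, now ours
second = semaphore.lock_bts(0)   # Carry set: already taken, must wait
semaphore.release(0)
print(first, second)
```

The first caller sees Carry = 0 and owns the resource; any later caller sees Carry = 1 until the owner releases the bit.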
; Source code asm file
        TITLE   CPUID
        DOSSEG
        .model  small
        .stack  100h
        .data
fp_status       dw      ?
id_mess         db      "This system has a$"
fp_8087         db      " and an 8087 math coprocessor$"
fp_80287        db      " and an i287tm math coprocessor$"
fp_80387        db      " and an i387tm math coprocessor$"
c8086           db      "n 8086/8088 microprocessor$"
c286            db      "n 80286 microprocessor$"
c386            db      " i386tm microprocessor$"
c486            db      " i486tm DX microprocessor or i487tm SX math coprocessor$"
c486nfp         db      " i486tm SX microprocessor$"
period          db      ".$",13,10
present_86      dw      0
present_286     dw      0
present_386     dw      0
present_486     dw      0
;
; The purpose of this code is to allow the user the ability to identify the processor and coprocessor
; that is currently in the system. The algorithm of the program is to first determine the processor
; id. When that is accomplished, the program continues to then identify whether a coprocessor
; exists in the system. If a coprocessor or integrated coprocessor exists, the program will identify
; the coprocessor id. If one does not exist, the program then terminates.
;
        .code
start:  mov     ax,@data
        mov     ds,ax                   ; set segment register
        mov     dx,offset id_mess       ; print header message
        mov     ah,9h
        int     21h
;
; 8086 CPU check
; Bits 12-15 are always set on the 8086 processor.
;
        pushf                           ; save EFLAGS
        pop     bx                      ; store EFLAGS in BX
        mov     ax,0fffh                ; clear bits 12-15
        and     ax,bx                   ; in EFLAGS
        push    ax                      ; store new EFLAGS value on stack
        popf                            ; replace current EFLAGS value
        pushf                           ; set new EFLAGS
        pop     ax                      ; store new EFLAGS in AX
        and     ax,0f000h               ; if bits 12-15 are set, then CPU
        cmp     ax,0f000h               ; is an 8086/8088
        mov     dx,offset c8086         ; store 8086/8088 message
        mov     present_86,1            ; turn on 8086/8088 flag
        je      check_fpu               ; if CPU is 8086/8088, check for 8087
;
; 80286 CPU check
; Bits 12-15 are always clear on the 80286 processor.
;
        or      bx,0f000h               ; try to set bits 12-15
        push    bx
        popf
        pushf
        pop     ax
        and     ax,0f000h               ; if bits 12-15 are cleared, then CPU
        mov     dx,offset c286          ; is an 80286
        mov     present_86,0            ; turn off 8086/8088 flag
        mov     present_286,1           ; turn on 80286 flag
        jz      check_fpu               ; if CPU is 80286, check for 80287
;
; i386 CPU check
; The AC bit, bit #18, is a new bit introduced in the EFLAGS register on the i486 DX CPU to
; generate alignment faults. This bit can be set on the i486 DX CPU, but not on the i386 CPU.
;
        mov     bx,sp                   ; save current stack pointer to align it
        and     sp,not 3                ; align stack to avoid AC fault
        db      66h
        pushf                           ; push original EFLAGS
        db      66h
        pop     ax                      ; get original EFLAGS
        db      66h
        mov     cx,ax                   ; save original EFLAGS
        db      66h                     ; xor EAX,40000h
        xor     ax,0                    ; flip AC bit in EFLAGS
        dw      4                       ; upper 16-bits of xor constant
        db      66h
        push    ax                      ; save for EFLAGS
        db      66h
        popf                            ; copy to EFLAGS
        db      66h
        pushf                           ; push EFLAGS
        db      66h
        pop     ax                      ; get new EFLAGS value
        db      66h
        xor     ax,cx                   ; if AC bit cannot be changed, CPU is
        mov     dx,offset c386          ; store i386 message
        mov     present_286,0           ; turn off 80286 flag
        mov     present_386,1           ; turn on i386 flag
        je      check_fpu               ; if CPU is i386, now check for
                                        ; 80287/80387 MCP
;
; i486 DX CPU / i487 SX MCP and i486 SX CPU checking
;
        mov     dx,offset c486nfp       ; store 486NFP message
        mov     present_386,0           ; turn off i386 flag
        mov     present_486,1           ; turn on i486 flag
;
; Co-processor checking begins here for the 8086/80286/i386 CPUs.
; The algorithm is to determine whether or not the floating-point status and control words can be
; written to. If they are not, no coprocessor exists. If the status and control words can be written
; to, the correct coprocessor is then determined depending on the processor id. Coprocessor
; checks are first performed for an 8086, 80286 and a i486 DX CPU. If the coprocessor id is still
; undetermined, the system must contain a i386 CPU. The i386 CPU may work with either
; an 80287 or an 80387. The infinity of the coprocessor must be checked to determine the correct
; coprocessor id.
;
check_fpu:                              ; check for 8087/80287/80387
        fninit                          ; reset FP status word
        mov     fp_status,5a5ah         ; initialize temp word to non-zero value
        fnstsw  fp_status               ; save FP status word
        mov     ax,fp_status            ; check FP status word
        cmp     al,0                    ; see if correct status with written
        jne     print_one               ; jump if not Valid, no NPX installed
        fnstcw  fp_status               ; save FP control word
        mov     ax,fp_status            ; check FP control word
        and     ax,103fh                ; see if selected parts looks OK
        cmp     ax,3fh                  ; check that ones and zeroes correctly read
        jne     print_one               ; jump if not Valid, no NPX installed
        cmp     present_486,1           ; check if i486 flag is on
        je      is_486                  ; if so, jump to print 486 message
        jmp     not_486                 ; else continue with 386 checking
is_486:
        mov     dx,offset c486          ; store i486 message
        jmp     print_one
not_486:
        cmp     present_386,1           ; check if i386 flag is on
        jne     print_87_287            ; if i386 flag not on, check NPX for
                                        ; 8086/8088/80286
        mov     ah,9h                   ; print out i386 CPU ID first
        int     21h
;
; 80287/80387 check for the i386 CPU
;
        fld1                            ; must use default control from FNINIT
        fldz                            ; form infinity
        fdiv                            ; 8087/80287 says +inf = -inf
        fld     st                      ; form negative infinity
        fchs                            ; 80387 says +inf <> -inf
        fcompp                          ; see if they are the same and remove them
        fstsw   fp_status               ; look at status from FCOMPP
        mov     ax,fp_status
        mov     dx,offset fp_80287      ; store 80287 message
        sahf                            ; see if infinities matched
        jz      restore_EFLAGS          ; jump if 8087/80287 is present
        mov     dx,offset fp_80387      ; store 80387 message
restore_EFLAGS:
        mov     ah,9h                   ; print NPX message
        int     21h
        db      66h
        push    cx                      ; push ECX
        db      66h
        popf                            ; restore original EFLAGS register
        mov     sp,bx                   ; restore original stack pointer
        jmp     exit
print_one:
        mov     ah,9h                   ; print out CPU ID with no NPX
        int     21h
        jmp     exit
print_87_287:
        mov     ah,9h                   ; print out 8086/8088/80286 first
        int     21h
        cmp     present_86,1            ; if 8086/8088 flag is on
        mov     dx,offset fp_8087       ; store 8087 message
        je      print_fpu
        mov     dx,offset fp_80287      ; else CPU=80286, store 80287 message
print_fpu:
        mov     ah,9h                   ; print out NPX
        int     21h
        jmp     exit
exit:
        mov     dx,offset period        ; print out a period to end message
        mov     ah,9h
        int     21h
        mov     ax,4c00h                ; terminate program
        int     21h
        end     start
Copyright Brian Brown, 1991-2000, All rights reserved.